{
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7-final"
  },
  "orig_nbformat": 2,
  "kernelspec": {
   "name": "python3",
   "display_name": "DataAnalysis"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2,
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tools import *\n",
    "%matplotlib inline"
   ]
  },
  {
   "source": [
    "# Chap 3 处理原始文本\n",
    "1.  如何访问文件内的文本？\n",
    "2.  如何将文档分割成单独的单词和标点符号，从而进行文本语料上的分析？\n",
    "3.  如何产生格式化的输出，并把结果保存在文件中？"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "## 3.2 字符串：底层的文本处理"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "### 3.2.1 字符串的基本操作"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "monty=  Monty Python!\ncircus=  Monty Python's Flying Circus\ncircus=  Monty Python's Flying Circus\n"
     ]
    }
   ],
   "source": [
    "monty = 'Monty Python!'\n",
    "print('monty= ', monty)\n",
    "\n",
    "circus = \"Monty Python's Flying Circus\"\n",
    "print('circus= ', circus)\n",
    "\n",
    "circus = 'Monty Python\\'s Flying Circus'\n",
    "print('circus= ', circus)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "couplet=  Shall I compare thee to a Summer's day?Thou are more lovely and more temperate:\ncouplet=  Shall I compare thee to a Summer's day?Thou are more lovely and more temperate:\ncouplet=  Shall I compare thee to a Summer's day?Thou are more lovely and more temperate:\ncouplet=  Shall I compare thee to a Summer's day?\nThou are more lovely and more temperate:\ncouplet=  Shall I compare thee to a Summer's day?\nThou are more lovely and more temperate:\n"
     ]
    }
   ],
   "source": [
    "couplet = \"Shall I compare thee to a Summer's day?\" \\\n",
    "          \"Thou are more lovely and more temperate:\"\n",
    "print('couplet= ', couplet)\n",
    "\n",
    "couplet = \"Shall I compare thee to a Summer's day?\" +\\\n",
    "          \"Thou are more lovely and more temperate:\"\n",
    "print('couplet= ', couplet)\n",
    "\n",
    "couplet = (\"Shall I compare thee to a Summer's day?\"\n",
    "           \"Thou are more lovely and more temperate:\")\n",
    "print('couplet= ', couplet)\n",
    "\n",
    "couplet = \"\"\"Shall I compare thee to a Summer's day?\n",
    "Thou are more lovely and more temperate:\"\"\"\n",
    "print('couplet= ', couplet)\n",
    "\n",
    "couplet = '''Shall I compare thee to a Summer's day?\n",
    "Thou are more lovely and more temperate:'''\n",
    "print('couplet= ', couplet)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "couplet=  Rough winds do shake the darling buds of May,And Summer's lease hath all too short a date:\ncouplet=  Rough winds do shake the darling buds of May,\nAnd Summer's lease hath all too short a date:\n"
     ]
    }
   ],
   "source": [
    "couplet = (\"Rough winds do shake the darling buds of May,\"\n",
    "           \"And Summer's lease hath all too short a date:\")\n",
    "print('couplet= ', couplet)\n",
    "\n",
    "couplet = '''Rough winds do shake the darling buds of May,\n",
    "And Summer's lease hath all too short a date:'''\n",
    "print('couplet= ', couplet)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "a=  [1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1]\nb=  ['            very', '          veryvery', '        veryveryvery', '      veryveryveryvery', '    veryveryveryveryvery', '  veryveryveryveryveryvery', 'veryveryveryveryveryveryvery', '  veryveryveryveryveryvery', '    veryveryveryveryvery', '      veryveryveryvery', '        veryveryvery', '          veryvery', '            very']\n"
     ]
    }
   ],
   "source": [
    "a = [1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1]\n",
    "print('a= ', a)\n",
    "\n",
    "b = [' ' * 2 * (7 - i) + 'very' * i for i in a]\n",
    "print('b= ', b)"
   ]
  },
  {
   "source": [
    "### 3.2.2 字符串输出"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Monty Python!\nMonty Python!Holy Grail\nMonty Python! Holy Grail\nMonty Python! and the Holy Grail\n"
     ]
    }
   ],
   "source": [
    "print(monty)\n",
    "\n",
    "grail = 'Holy Grail'\n",
    "print(monty + grail)\n",
    "print(monty, grail)\n",
    "print(monty, 'and the', grail)"
   ]
  },
  {
   "source": [
    "### 3.2.3 访问字符串中的单个字符"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "monty[0]=  M\n",
      "monty[3]=  t\n",
      "monty[5]=   \n",
      "monty[len(monty) - 1]=  !\n",
      "monty[-1]=  !\n",
      "monty[-7]=  P\n"
     ]
    }
   ],
   "source": [
    "print(\"monty[0]= \", monty[0])\n",
    "print(\"monty[3]= \", monty[3])\n",
    "print(\"monty[5]= \", monty[5])\n",
    "print(\"monty[len(monty) - 1]= \", monty[len(monty) - 1])\n",
    "\n",
    "# 超出字符串的索引\n",
    "# print(\"monty[20]= \", monty[20])\n",
    "\n",
    "print(\"monty[-1]= \", monty[-1])\n",
    "print(\"monty[-7]= \", monty[-7])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "c o l o r l e s s   g r e e n   i d e a s   s l e e p   f u r i o u s l y "
     ]
    }
   ],
   "source": [
    "sent = 'colorless green ideas sleep furiously'\n",
    "for char in sent:\n",
    "    print(char, end=' ')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "fdist.most_common(5)=  [('e', 117092), ('t', 87996), ('a', 77916), ('o', 69326), ('n', 65617)]\nchar_list=  ['e', 't', 'a', 'o', 'n', 'i', 's', 'h', 'r', 'l', 'd', 'u', 'm', 'c', 'w', 'f', 'g', 'p', 'b', 'y', 'v', 'k', 'q', 'j', 'x', 'z']\n"
     ]
    },
    {
     "output_type": "display_data",
     "data": {
      "text/plain": "<Figure size 432x288 with 1 Axes>",
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZUAAAEECAYAAADgYandAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl8XGXZ//HPlcneJulKmy50AwpdoDShhQoUqhRFqCDgg4Ba8LGiiPj4+AAuSFUQEVDhByqibMqioICtIFC6gW1p01LaUrpvQOmatkmzL9fvj5mEENLMZJklyff9ep3XzJxc59zXSSZzzVnu+5i7IyIi0h6S4p2AiIh0HioqIiLSblRURESk3aioiIhIu1FRERGRdqOiIiIi7UZFRURE2o2KioiItBsVFRERaTfJ8U4g1vr06eNDhw5t1bJlZWVkZGREdZmuFp+IOWmb4x+fiDklWnys2qizfPnyfe7eN2ygu3epKS8vz1uroKAg6st0tfhYtJFo8bFoo6PHx6KNjh4fqzbqAAUewWesDn+JiEi7iUpRMbN0M/uWmW0zs+mheeeb2XwzW25mt4bmBczsfjNbYGaLzGxsaP4EM1toZovN7K6WxoqISHxE65xKP6ACeLzBvFXAOUAtsNXM7gamAsnuPtnMpgB18x4ELnD3HaGCMQkYHGmsuy+K0naJiEgzzKM49L2ZzQS2ufsjDeb1AJYCo4G7gFeBncDNwLjQ9CowEbgVmAw8CQyPNNbd72mUxwxgBkBubm7erFmzWrU9paWlZGZmRnWZrhafiDlpm+Mfn4g5JVp8rNqok5+fv9zd88MGRnLipbUTMBOY3uB1N+AF4FOh1/cCDwM/BFKBTUAvYC3wAHAycB1wfUtim8tJJ+oTKz4WbSRafCza6OjxsWijo8fHqo06JNqJejPrAzwL3OHuc0KzCwB399uA8cA6dy8MFY3bgZXAecCiFsaKiEgcROWcipkNAp4DBgAVoXMgPYFjgZ+YGcAfCB7WmmpmC4Eq4JrQKmYATwPlwCvuvszMVkYaG41t2rC7mNkbSzhqWCmDe7Vu91FEpLOLSlFx9/eA8Mfegq5sYvm5wCmN5lVFGhsN98/bxPMrixk2ZA9fPm1otJsTEemQ1E8lQhOH9QbgjS2Fcc5ERCRxqahEaOLwXgC8sXV/3UUHIiLSiIpKhIb36UaPtCT2Ha5k896SeKcjIpKQVFQiZGaM6psKBPdWRETk41RUWmB0XVHReRURkSapqLTAqL4pgM6riIgciYpKCwzOTqZXt1R2F1WwbX9pvNMREUk4KiotYGZMHBa6CmyLzquIiDSmotJC9UVlq86riIg0pqLSQhOH13WC1HkVEZHGVFRaaGS/LHpkprDzUDnvHSiLdzoiIglFRaWFkpKMU4YGD4Et0XkVEZGPUFFpBZ1XERFpmopKK5xad15FPetFRD5CRaUVTsjNJis9mXcLy3j/oM6riIjUUVFphUCSMWGo+quIiDSmotJK9UPhaxwwEZF6KiqtVH/TLp1XERGpp6LSSqMHZNM9LZlt+0vZXVQe73RERBKCikorJQeSyBvSE1B/FRGROioqbfDhLYZ1XkVEBFRU2qT+vIr2VEREABWVNjlxUA4ZKQE27y1hT7HOq4iIqKi0QUqD8ypLdQhMRERFpa1OVX8VEZF6KiptNFHjgImI1FNRaaMTB+WQlpzEht2HKSypjHc6IiJxFZWiYmbpZvYtM9tmZtND8yaY2UIzW2xmd4XmBczsfjNbYGaLzGxse8TGUlpygPFH151X0d6KiHRt0dpT6QdUAI83mPcgcKW7nwZMMLNJwCVAsrtPBn4E3N1OsTFV119lic6riEgXZ9G8z7qZzQS2Ac8DrwITgVuBycCTwPDQ/J3AzcC40NSmWHe/p1EeM4AZALm5uXmzZs1q1faUlpaSmZn5sflr9lRwy4IDDM1J5u6pfSJapqVtdNb4RMxJ2xz/+ETMKdHiY9VGnfz8/OXunh820N2jNgEzgelAL2At8ABwMnAdcD1wL/Aw8EMgFdjUHrHN5ZSXl+etVVBQ0OT8sspqP/YHL/jQm2b7gZKKiJZpaRudNT4WbSRafCza6OjxsWijo8fHqo06QIFH8LkfkxP17l4YKgS3AyuB84BFQEHwx34bMB5Y106xMZWeEmDc4B64q7+KiHRtydFYqZkNAp4DBgAVZjaF4OGnp4Fy4BV3X2ZmK4GpZrYQqAKuCa2iTbHR2KZwJg7vxdJthbyxtZCpo/vHIwURkbiLSlFx9/eApo69ndIorgq4sonl57YlNh5OHd6b/zd3k/qriEiXpn4q7WT80T1JCRhrdxZRVF4V73REROJCRaWdZKQGOHFQD2odCrbpvIqIdE0qKu1o4jCNAyYiXZuKSjuqGwdsia4AE5EuSkWlHeUN6UkgyVjz/iEOV1THOx0RkZhTUWlH3dOSGTMwh5pa13kVEemSVFTa2anDdN96Eem6VFTa2cT6m3apv4qIdD0qKu0sf2gvkgxWvXeI0kqdVxGRrkVFpZ1lp6cwekAO1bXOiu0H452OiEhMqahEQX1/FQ3ZIiJdjIpKFNTft16dIEWki1FRiYIJQ3thBivfPUhFTfRugiYikmhUVKIgJzOF4/tnU1lTy4b9lfFOR0QkZlRUoqTuvMqKDyrinImISOyoqETJeWNzAXhhYymr3zsU52xERGJDRSVKJgzrxVdOG0K1w/V/fVN9VkSkS1BRiaLvn3cCg7KT2bK3hFv/9U680xERiToVlShKTwnwPxNzSA0k8cQbO3j57V3xTklEJKpUVKJsaI8UbvzM8QDc+PdV7Ckqj3NGIiLRo6ISA1dNGsoZx/bhQGkV//v0W9TWqu+KiHROKioxkJRk3H3pSfTMTOG1jft46D9b452SiEhUqKjEyFHZ6dxx8YkA/PLf61m7syjOGYmItD8VlRiaOro/l088msqaWq5/6k3Kq2rinZKISLtSUYmxmz87iuF9u7Fxz2Fuf0GXGYtI56KiEmMZqQHuvexkUgLGo4u3M2/dnninJCLSblRU4mDMwBy+N3UkAP/3zFvsLdb4YCLSOcSsqJhZupn9zcwWmdkqM/u8mU0ws4VmttjM7grFBczsfjNbEIodG5ofcWxH8LUzhjNpRG/2Ha7khmfewl2XGYtIxxfLPZWJQIW7TwKuBqYDDwJXuvtpwAQzmwRcAiS7+2TgR8DdoeVbEpvwkpKMu79wEjkZKcxbv5c/L9ke75RERNrMYvUN2czSgKeA/cAg4IcEC8VE4FZgMvAkMBx4FdgJ3AyMC02vRhLr7kOaaHsGMAMgNzc3b9asWa3ahtLSUjIzM9t1mcXvlXPX4oOkJsEdn+pNn5SqFrXR0pwSLT4Rc9I2xz8+EXNKtPhYtVEnPz9/ubvnhw1095hMQF/gZYIf7i8QLCprgQeAk4HrgOuBe4GHQz9PBTYBvSKNDZdHXl6et1ZBQUFUlrnh6bd8yI2z/dxfL/BFbyyLak6JFh+LNhItPhZtdPT4WLTR0eNj1UYdoMAj+KyP5eGvacAqd/8DwcJyaagQ3A6sBM4DFgEFgLv7bcB4YJ27F0YaG8PtaTc/vmAUQ3tnsm5XMY+tKo53OiIirZYcw7ZeAK4ws/mhdn8GHACeBsqBV9x9mZmtBKaa2UKgCrgmtPyMFsR2KN3SkrnnspO55PeLeHFTKU8t3cFlE46Od1oiIi0Ws6Li7h8AU5r40SmN4qqAK5tYfm6ksR3RSYN7cNuFY7nh76v40XNrGNK7G6eN6B3vtEREWkT9VBLIF04ZzLTjMqmudb7x+HK27y+Jd0oiIi2iopJgrjwxiynHH8XB0iqufmQZReVV8U5JRCRiKioJJmDGPZeNY2S/LDbvLeFbT7xJdU1tvNMSEYmIikoCykpP4Y9fyad3t1QWbtir+9uLSIehopKgBvfK5PdfyiM1kMQji7bx+BvqcS8iiU9FJYGdMrQXP/98cDizW55/m0Wb9sU5IxGR5qmoJLhL8gbx9cnDQ1eErWDrPl0RJiKJS0WlA7jh3OP51An9OFRWxVcfWcahUl0RJiKJSUWlAwgkGb+5bBzH989iy74Srn1iBVW6IkxEEpCKSgfRPS2ZP34lnz7dU3l90z5+NnttvFMSEfkYFZUOZFDPTB4IXRH22OLt/HnxtninJCLyESoqHUzekF784uLgFWEzZ61l5S7dilhEEoeKSgf0+fGD+OZZI6ipdW577QDf/dtKduwvjXdaIiIqKh3V96aO5L9PH4YZ/GPF+0y5ez4/eHY1Hxwqi3dqItKFqah0UElJxo/OH8W9n+7DxeMHUevOE2/sYPKd8/nprLXsLdZhMRGJPRWVDq5/92Tu/sJJvPw/k/nsiblUVtfy0H+2cuYv53HHv9dxsLQy3imKSBeiotJJHHNUd+6/fDwvfPsMPnXCUZRV1fC7+Zs544553DNnI8UaQl9EYqDFRcXMPmFm6dFIRtpu1IBs/viVU3j2m5M449g+FFdU8+s5Gzjzl/N4YMFmKqo93imKSCcW0e2EzexVd/+kmd0EnAUUAV+IZmLSNicf3ZM/f3UiS7bs566X1lOw/QC3v7gOgJR/vkBKIInU5KTgY+h5aiCJlGQLPobmVZUWk79vPcf2685x/bIY3rcbacmBOG+diCSqSO9RXzcmyGcIFpV5UclG2t2pw3vz9DWnsWDDXn49ZyNvvXuQqhqnqqaG0sqaiNax5P1N9c8DScaQ3pkcd1QWx/brzrH9sjiuX3eG9+lOarKOpop0dZEWlWozuw9Y6O5uZvqq2oGYGWeNPIqzRh5FQUEBJ44bT2VNLVXVtVTW1FIZeqwKPa+qqaWiupaqGufNt9dTndmXDbuL2bjnMNv3l7Blb3D699sfthFIMob2zuSotGpOO7iRUbnZjBqQTW5OOmYWv40XkZiKtKhcBXwS+Gvo9Z+ik45Em5kFD3UlJ0Fa+PjuxTvIyxtZ/7q8qobNew+zcfdhNuwuZsPuw2zcU8yOwlI27y1hM7D4vQ318T0yUzihf7DAnJCbzajcbI45Sns1Ip1VREXF3XcBjzd4/Ui0EpLElp4SYPSAHEYPyPnI/LLKYLF5aclqytJ6sfaDItZ+UMTB0ioWb9nP4i3762NTAsaxR2VxQm42YwZmc2xAIy6LdBaRnqj/k7t/9UivRTJSA4wZmEPF0Azy8kYB4O7sKipn7c4i1u4s4p1dwcdt+0vri87fV8AJfVKYPcEJJOkwmUhHF+nhr2GNXo9sMkqkATMjNyeD3JwMPnlCv/r5hyuqWR8qMPfO3cQ7+yp4dNE2rj698dtMRDqaZouKmZ0PXAwcb2YPhWb3JnhJsUirdE9LJm9IL/KG9KJ/TgZfe6yAX760jrOPP4phfbrFOz0RaYNwZ0uXAo8Cu0OPjwK3AtOinJd0EeeM6seZR6dTXlXLDc+8RW2tOmeKdGTNFhV33+Pu84Gn3H1BaFrm7tWtaczMJpjZi2Y218y+Gnq90MwWm9ldoZiAmd1vZgvMbJGZjW2wbESx0rFcfXI2fbPSWLbtAI8s2hbvdESkDSI9p/I7M7sYqD824e6PtaQhM8sC7gGmufve0Ly3gAvcfUeoYEwCBgPJ7j7ZzKYAdwNTgQdbECsdSFZqEj+/aKwOg4l0AuYe/nCDmS0BlgAH6ua5+09a1JDZecDPCB5KywL+CFwPTCR4SG0y8CQwHHgV2AncDIwLTa9GEuvuQ5poewYwAyA3Nzdv1qxZLUm9XmlpKZmZmVFdpqvFN1zmnjcOsnBHOSf0SeGnZ/Ui6QidJhNtG/S+aP/4RMwp0eJj1Uad/Pz85e6eHzbQ3cNOwLxI4sKs4zLggdDzvsBWYC3wAHAycB3BInMv8DDwQyAV2AT0ijQ2XB55eXneWgUFBVFfpqvFN1zmQEmF59/6ig+5cbb/6bUtccspltus+Pi10dHjY9VGHaDAI/isj7Rb81tm9mMzO7NuinC5hjYCdXsRlUBJqBDcDqwEzgMWAQWAu/ttwHhgnbsXRhrbirwkQfTITOXnFwVPi/3ypXVs3VcS54xEpKUiPaeSE5rqOhI4sLAlDbn7cjNbb2aLgCqCexoOPA2UA6+4+zIzWwlMNbOFobhrQquY0YJY6aDOGdWPC8cN4LmVO7nhmbf464zTSFKnSJEOI9JhWq5qj8bc/fomZp/SKKYKuLKJZedGGisd28xpo/nP5v31V4OpU6RIxxHR4S8z22pmWxpMy6OdmHRdOgwm0nFFVFTcfZi7D3f34QR72C+JblrS1dUdBlOnSJGOpcXjj7v7mwSv3hKJqpnTRtOnuzpFinQkkR7+mhfqBT/PzJYCrepRL9ISwcNgYwAdBhPpKCLdU5lO8EZd04Hz3P3yaCUk0tDU0f11GEykA4m0qOwBLgS+B0zT7YQllnQYTKTjiLSoPAr0BGYTHBrlgahlJNJI48NgO4t19FUkUUVaVPq6+0x3f8ndfwSMiGZSIo01PAx237JDVNXoFsQiiSjSolJjZsMBzGwY0LoRyUTaYOa00fTPTmf9/ip+8aJG5BFJRJEWle8CfzGzTcAzodciMdUjM5X7rxhPwOBPr2/lhdUfxDslEWkkbFExsyeANe4+yd2PIThUyteinplIE/KG9OTLJ2UBcMMzq9iy93CcMxKRhiLZU8l19/oD2KHng6OXkkjzPntMJp8dm8vhimq++fgKyipr4p2SiIREUlRSzKz+HIqZpRO8yZZIXJgZv7h4LMP7dGPdrmJ++Nzqunv2iEicRVJUfgPMMbNvm9l1BO+0+EhUsxIJIys9hd9eOZ70lCT+seJ9nlr2brxTEhEiKCru/gzwFYJDsyQB17v7b6OdmEg4x/fPrh/N+JZ/vs2a9w/FOSMRiXSU4o3u/lt3v8fdC6KdlEikPj9+EJdPPJrK6lq+8fhyDpVWxTslkS6txaMUiySaH58/ijEDs3m3sIz/fXqlxgcTiSMVFenw0lMC/O6KPLLTk5nzzh4eWLgl3imJdFkqKtIpDO6Vya//axwAd760jsWb98c5I5GuSUVFOo1PntCPb541glqH6558kz1F5fFOSaTLUVGRTuW75xzHacN7s+9wBd964k2qNfCkSEypqEinkhxI4t4vnsxRWWks3VbInS+tj3dKIl2Kiop0On2z0rjv8vEEkowHFm7hjfd1GEwkVlRUpFOaMKwXN356JAC/WnKQ38zZQEW1xggTiTYVFem0vnbGcKZPGkp1LfxmzkY+e+/rLN9eGO+0RDo1FRXptMyMmdNG85PJPRnWpxub9hzmkt8v5ubn1lBcrp73ItGgoiKd3pij0njx+jO49uwRBMz485LtnPOrhbz89q54pybS6cS8qJjZdWZWbWZDzWyCmS00s8Vmdlfo5wEzu9/MFpjZIjMbG5ofcaxIY+kpAf7v3OOZdd3pnDS4B7uKypnx5+V84y/L1Z9FpB1ZLO9DYWZDgQeADODLwPPABe6+w8wWAjcRvAHYFHf/uplNAW5y96lm9laksU20OwOYAZCbm5s3a9asVuVfWlpKZmZm+MA2LNPV4uORU407/95UyhOrD1Ne42SmGF86MYtPDcsgyaxTbnNHi0/EnBItPlZt1MnPz1/u7vlhA909JhNgwCzgOGA+cDKwAkgB7gCWANcD9wDTgHyCRWc70DPS2HB55OXleWsVFBREfZmuFh+LNo4U/25hiU9/6A0fcuNsH3LjbL/094t8057iTr3NHSU+Fm109PhYtVEHKPAIPutjefhrBjDP3TeEXhuQDtwHPAU83mD+RcC5wKVAVQtjRSIyqGcmD00/hXsuG0fvbqks3VrIZ+55jWfXHdadJEVaKZZF5QLgQjObD4wDfgucANwOrATOAxYBBYC7+23AeGCduxcCqZHExnB7pBMwMz43biBzvjuZS/IGUVldy19WH2bWqg/inZpIh5Qcq4bc/fy656HCMh0YDjwNlAOvuPsyM1sJTA2dN6kCrgktNqMFsSIt0rNbKnddehInDcrh5uff5qez1jL5uL7kZKTEOzWRDiVmRaUhdz8r9HQbcEqjn1UBVzaxzNxIY0Va64qJQ3j89Q2s21/BL/+9jtsu0gWFIi2hfioiDSQlGV/PyyY5yXhi6Q6Wbz8Q75REOhQVFZFGjs5JYcaZw3GHHz67mioNny8SMRUVkSZcN+VYBvfKYN2uYh56fWu80xHpMFRURJqQkRrgZ58bAwQHo3y3sDTOGYl0DCoqIkdw1sijOP/EXMqqarjln2+r74pIBFRURJrx4/NHkZWWzNx1e/j3Gg1AKRKOiopIM47KTueGzxwPwMxZb2vIfJEwVFREwrhiwtGMG9yD3UUV3P3yhvALiHRhKioiYSQlGT+/aCyBJOPRxdtY9d7BeKckkrBUVEQiMGpANl89fRju8INnV1OtvisiTVJREYnQdz51LAN7ZLDm/SIeXbw93umIJCQVFZEIZaYm85NpowH41cvr+eBQWZwzEkk8KioiLfCpUf349Oj+lFTWMPOfb8c7HZGEo6Ii0kK3TBtFt9QAL729m1fW7o53OiIJRUVFpIVyczL43rkjAbjl+TWUVeukvUgdFRWRVvjyaUMZOzCHnYfK+evbh+OdjkjCUFERaYVAqO9KksG/Npbyq5fXc6CkMt5picSdiopIK40dlMM3zhpBrcO9czfxiTvmcuvstew6VB7v1ETiJi63ExbpLP7v3OPp7wd49YMA89fv5Y+vb+Wxxdu5OG8Q10wezpDe3eKdokhMaU9FpI1G9U3lkasmMPu60/ns2Fyqamt5cukOzr5rPtc/9SbrdhXFO0WRmFFREWknYwbmcP8V45nz3clcmjeIJDOeX7mTT//mNf770QLe3KH73Uvnp8NfIu1sRN/u3HnpSXznnON4cOEWnly6gznv7GbOO7uZNKI33zhrBKbLkKWTUlERiZKBPTKYOW001559DA//Zyt/XrydRZv3s2jzfgCyXniJ/jnpwSk7/ePPs9Pp1S0VM4vzlohETkVFJMr6ZqVxw6eP5+uTR/CXJdv5+4r3eG9/CcUV1RTvOczGPUfu55KanES/7DSSa6votXQRGSkBMlIDwcfQ8/SUAJmheemhx5I9FRxTVkVORkoMt1RERUUkZnIyUrj27GO49uxjKCgo4JhRJ/LBoXJ2FZWz61A5HxwqZ/ehcj4oCj0eKqOovJp3C4MDV2492LJzMrcseJmje2UyekA2YwbmMHpANqMH5NA3Ky0amycCqKiIxIWZ0SMzlR6ZqZyQm33EuNLKanYXVVCwcjVDjzmOssoayqpqKK+qqX9eWvnx12u272FHUQ07CkvZUVjKi2t21a+zX3YaowfkMGZANqNDxcbdY7HZ0gWoqIgksMzUZIb1SaawZwp5Q3tFvNzy5cs5adzJbN5bwpr3D/H2ziLW7DzE2p1F7C6qYHfRHuau2/PRhZ75FwB1p3AansmpO69TNy87zbimZDNXnjqEzFR9jMiHYvZuMLNM4A/AQCAT+DKQA9wFpAD/cffvmVkAuBcYE5r/dXdfbWYTIo2N1TaJJLLkQBIj+2cxsn8WF+cF59XWOjsKS1mzM1RoQgWnsMEQM3U7LR/Zd2m0J1NY5vz8hXU8sGALXztzOF86dQjd0lRcBCyWu71mNsLdN5vZDGA0cBZwgbvvMLOFwE3AYGCKu3/dzKYAN7n7VDN7K9LYJtqdAcwAyM3NzZs1a1ar8i8tLSUzMzOqy3S1+ETMqStuc0lJCZmZmfWF5KMF5aPzHFi+o5jnNlexsbAKgOxUY9rIbnz6mEwykj/e/a0z/I4SLT5WbdTJz89f7u75YQPdPeYTMBO4BVhBcA/jDmAJcD1wDzANyAeeB7YDPSONDdd2Xl6et1ZBQUHUl+lq8bFoI9HiY9FGLOJra2t9/vo9fuH9r/uQG2f7kBtn+7ifvOT3zd3oxeVVbVp/a3PqSvGxaqMOUOARfL7HvEe9mV0InAz8HkgH7gOeAh6vCwEuAs4FLgWqQvMijRWRGDAzJh/Xl398YxKPXT2BvCE9OVBaxZ0vref0O+Zy/7xNFJfrX7KriWlRMbOrCX74X+ruu4FU4HZgJXAesAgoANzdbwPGA+vcvTDS2Fhuj4gEi8uZx/XlmWtO4y9fnUj+kJ4cDBWXM345j/vmbqS0SiMIdBWxPFGfDzwI/Ad42cwqCZ7neBooB15x92VmthKYGjpvUgVcE1pFS2JFJMbMjNOP7cMnjunNos37uWfORpZuK+SulzeQGoB+C+fSLTWZrPRkuqUl073B1C3to/O7pQXYsquC4vV7mmynseQkI1Cjy6ITQcyKirsXAIEmfnRKo7gq4Momlp8baayIxI+Z8Ylj+jBpRG8Wb9nPb+ZsZOnWwvpOnC3y2rKIQ4f1SGbWSdV011VocaXfvohEhZkxaUQfJo3ow/xFyxg2chSHK6o5XF5NSWU1xeXVlFTUcLiiisMVNcH5FdUcrgj+/ODBQ2Tn5HxknX6Eq1U37C5m68EKrn18BX/6Sj7JAQ3AHi8qKiISdVlpSS2+Ydny5cvJy8uLKHbbvhKm3buABRv2cvPzb/Pzi8ZoIM44UTkXkQ5vaJ9u3HR6T1KTk3hy6Q4eWLgl3il1WSoqItIpjOydym/+axwAv3hxHbNX7YxzRl2TioqIdBrnjc3lB+cdD8B3//YWBdsK45xR16OiIiKdytfOGM6Vpx5NZXUtX3usgG37SuKdUpeioiIinYqZMfOC0Zw9si8HSquY/vDSjwyYKdGloiIinU5yIIn7Lh/P6AHZbNtfyozHCiivqol3Wl2CioqIdErd0pJ5aPop5OakU7D9AN97+i1qa9XrPtpUVESk0+qXnc5D00+he1oys1d9wJ0vr493Sp2eioqIdGon5Gbz2yvGE0gyfjd/M0+8sSPeKXVqKioi0umdeVxfbrtwDAA3P7+G+U0MVCntQ0VFRLqEyyYczbVnj6Cm1rn28RWs3FXB+wfLKKvUCfz2pLG/RKTL+N9zRvJuYRn/fGsnP3vtAD97bS4AaclJ9MxMpUdmCj0zU+nV7cPndY+BomoiG4msa1NREZEuIynJuPPSE8nOSOa1d3ZS4ckcKK2korqWXUXl7CoqP/KyBpXd3+ULpwyOYcYdj4qKiHQpackBbr0nonTyAAALxElEQVRwLMsHV9aPglxWWUNhaSUHSio5WFpFYWklB0srOVBSxYHSSt47UMacd3Zzw99XsfdwBd88a4RGQT4CFRUR6fIyUgMMTM1gYI+MI8bc9tfX+OPKIu58aT17iyv48fmjSEpSYWlMJ+pFRCLw6WMyuf/y8aQGknhk0Tau/+tKKqtr451WwlFRERGJ0Hljc3nkqmBnyllv7eTqR5ZxuKI63mklFBUVEZEWmHRMH56acSp9uqfx+qZ9fPEPS9h3uCLeaSUMFRURkRYaMzCHv3/jNIb0zmT1+4e45HeL2LG/NN5pJQQVFRGRVhjSuxvPXDOpfiTki3+/iLd3Hop3WnGnoiIi0kp9s9J4asapTBrRm73FFVz2wBIWb94f77TiSkVFRKQNstJTePiqU/jsibkUV1TzlYeW8uLqD+KdVtyon4qISBulJQf4f5edTJ9uqTy6eDvffGIFXxqbRWq/Q+RkpJCTkUJWenKX6NeioiIi0g6SkoyZ00bTNyuNu17ewGOrinls1ev1PzeDrLRkskNFpvFUVHiYN4o3kWRGkkGSGdbgeZIFb5VsodcBM3a9X0Zx9z1kpSfTPS2F7unJZKUn0y01mUCcCpiKiohIOzEzvjXlWAb3yuSPc9dSE0jnUFkVRWVVFFdUU1QenN47UNb0Cta04iZiS5Y1ObtbaoDu6cl0T0ume3oK2enJ5FBKXpRHxezwRcXMAsC9wBggBfi6u6+Ob1Yi0pV9btxABtXsqh9bDKC6ppbi8mqKyqs4VPbxadO29+jbrx/u4O7UOtS646HH2tA8d6e2FmrceX/3XpIzsjhcUc3h8uoPHyurKamsoaSyht182IdmZO+UqG+7uXfsezab2X8BU9z962Y2BbjJ3ac2ipkBzADIzc3NmzVrVqvaKi0tJTMzM6rLdLX4RMxJ2xz/+ETMKdHim1um1p3yaqesyimtdsqqaimtcry6gpMHZbeojTr5+fnL3T0/bKC7d+gJuAeYBuQDzwPbm4vPy8vz1iooKIj6Ml0tPhZtJFp8LNro6PGxaKOjx8eqjTpAgUfwmdwZLik24CLgXOBSoCq+6YiIdF2doagUAO7utwHjgXVxzkdEpMvq8CfqgSeBqWa2kOBeyjVxzkdEpMvq8EXF3auAK+Odh4iIdI7DXyIikiBUVEREpN2oqIiISLvp8J0fW8rM9gLbW7l4H2BflJfpavGxaCPR4mPRRkePj0UbHT0+Vm3UGeLufcNGRdKZRVN9R8uIOv+0ZZmuFp+IOWmb4x+fiDklWnys2mjppMNfIiLSblRURESk3aiotMwfYrBMV4uPRRuJFh+LNjp6fCza6OjxsWqjRbrciXoREYke7amIiEi7UVEREZF2o6IiIiLtRkWlAzOz3vHOQRKHmU02swERxA1t8NxasP7cJua17jaC7cTMWjQobkvjm1g+0Jbl21tbtycaVFQiYGafNLOVZvZO6PWtzcTe1tzrIyxzrpn9y8zm1k1h4qeZ2RZgtpmtNrMzw8RfYGazQ+ueF279R1jHn8L8fEro8Wgze9DMPtHeOZlZq9+vR8rfzB42s4eamiJY57yGf7MI/m4vm9mrkW6zmV3X6PW1YVK6GLgvtO4XzezpI8Q92uD5q2HW2dA8M/uVmR0Tyue7wDoz+/6RFjCzy83sHTPbZGYLzWx0cw2Y2RUN4l8LFw+sN7N7zGxchNuwLrQNYyIJNrN7GjzvBswOE39/pO8nMzsu9Hh0o2mwmTV5M3kzm2Nm/ULPBwMLI9iG28ysR4PXvwi3TJtEu3dlZ5iAJUBvYF7o9SvNxM5t9HpBBOtfAZwMLAYmA3dHkE/P0PM+wMIw8auBccCQuqmZ2PMI3k3zyw2mrwBrw7SxIPT4GHB5uO0GVkWaUyj+V8DqCH6XX25iOmL+od/35NDvqO75XcBPI2hrSIPpIuDXEWzDhS3Y5vnNvT7CMkOB7wOvAPcfIeYNIKOp92uYde8HpgP/Cr3+N5DV3N8aWA7khJ4fD8wJ00bD+OMiiE8N/e6fDv3/XAtkNhOfBnwB+CfwOjAjzPpvJ3jL8t7Af4D/DhO/tNH76Pa6103E7gL6AQeAucC8BtM64Komljk7tJ3/B7wJnBPB3+1dYBlwckv/5q2ZEm7XKUFVuPt+M6u7/rp744DQ3sLZwFAz+3Fodm8iu2dNkbu/aWZl7r7AzH4YJr7c3Q8AuPu+CHbJ3wU2uHtpBLn8hOAb8CfAwwQLDEB6mOVqzCwT6O/uT5jZjAhy2ujuJRHkBJDv7mMjiGucd50m83f3BQBmtr/uObDAzF4K15C7NxxDbruZXRpmkdOAXsC0BvOubiY+xcyy3b3IzNIIfoCHczXBD9lfAc8eIeY3wMrQt+H+ob1eI3gH1eHNrHuNuz9iZl8KvU5z92Izq25mmZ1ADcGVr4vgcM17DeI3hNs7dfdK4Fkz2wN8C7gRuM7Mprv7kibiK4C/mdnGUPxPaabvhrt/P/T/XAB81d3D7VGXNnofvejuR9qTG+zuVWa20t2nNPxB6Pe0gOB7uaHXCH6p/B7wkLu/EiYfgI0Evwz8xcz+HEF8m6ioRGa2mc0GhpnZcwS/VTS2E9gGVPDhgJXvALdEsP7C0D/PW2b2LJATJv4NM3uE4Den0wh/C+UKYLGZLa+b4e5Nfpi5+ykAZvaOu/+0br6ZnRWmjaUEt/cboddH2n1/GHAgI5RTQbicQlaZ2WnuvjhMHh/Ju0G7Z4VZrsDM/kjwH3YYwW/AzWqwLQDdgB7NhENwb+htYGXodbhOYrcT/FuvBE4k+I25uXx6AP8ItXM18DuaKKbu/iTBO6ZiZvPc/ewwedT5wMz+DfQxs7uAsaFDKc194agBFoX+zhZa5qFQHvV/7wa/y26N4ps9Z2NmPwCuANYDD4ae9weeAj52WNjMbgK+SPBLzUMc4U6xZjaPD/8+BhwGbjazHzUuAI28Gdq+/xB8H2UeKdCDNxg80s+qzayyiR+9AfwVGAj8xMwWuPvkZvKBYH/EHWb2SeBeYFKY+DZR58cIhY7tjgY2u/vyZuJ+7e7/08o2AsAZBA/z7A8T+xlgDLAJeM6b+UOa2cfedA2+TUWaWx93b3Z0UzOzujzMrK+7740kl0hyMrM3gRMIFu9IvlW3Jv9zgZMIHpb4h7sfDhPfcFtKgJXufsRv7WbW+AuGN1UAGy2TQ/Aw0A533x0mdi7Bb6WbgM0E9wRXh1nm2+5+b3MxDWIDBD/M9gMTgLXAZwkOUrjqCMtE9PduJm6Pu7/TTE7fBx52912N5t/e1B5CqAg94u47j7TOUNyQZvJudpRzMzsPGAvsAZ5x9+Iw8YPd/d0m5k9pvGdkZhPd/Y0Gr89192b3qs1shLtvbvB6vLuvaG6ZtlBRERGRdqOrv0REpN2oqIiISLtRURFJUBbsa7LNzKbHOxeRSKmoiLSRmaWb2aOhjmnzzeyL7bFed/8M8Eh7rEskVnRJsUjbfRooc/dP1c0wswsI9plIIngV0CXAjwhePTWWYAe2Me5+jpnNJ3hJdj7By1g/d6Qrzyw4xMrvCV6quhP4Uqivw7cJ9qg34EF3j3p/BJGmaE9FpO1eB8aY2c/MbFho3ovufrq7TyL4fzY+NH8tsIFgJ7aGX+rWh/o/LAc+30xbdwI/dvczCXYUvCQ0/1rgMnc/UwVF4kl7KiJtFBrV4AxgCvC7UG/8ZaGe2CkE+9fUdYJ7k2BfmBV8tNd/3Rhcmwl23juSMcAvLTgOZHc+7Gh7FfBgqMPczCP1GxGJNhUVkXYQ6vT5amjIk2cIjvH1HWAN8K8WrGoyzZ9H2Qzc0rgDrrsvAs634ECevwY+2YI2RdqNiopIG5nZ5cAMgkOSGMEBHUcBfyJ43qO5sbHq/NPMDgGvu/vLZtaX4CCJQ4FyM5vs7lcRPE/zWzOrBaqA77j7WjN7PdR+MsFxv0TiQj3qReIsdKJ+urtvi3MqIm2mE/UiItJutKciIiLtRnsqIiLSblRURESk3aioiIhIu1FRERGRdqOiIiIi7eb/A728ou6NGIJ1AAAAAElFTkSuQmCC\n"
     },
     "metadata": {
      "needs_background": "light"
     }
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0xc1a03c8>"
      ]
     },
     "metadata": {},
     "execution_count": 9
    }
   ],
   "source": [
    "from nltk.corpus import gutenberg\n",
    "\n",
    "raw = gutenberg.raw('melville-moby_dick.txt')\n",
    "fdist = nltk.FreqDist(\n",
    "        ch.lower()\n",
    "        for ch in raw\n",
    "        if ch.isalpha()\n",
    ")\n",
    "print(\"fdist.most_common(5)= \", fdist.most_common(5))\n",
    "\n",
    "char_list = [\n",
    "        char\n",
    "        for (char, count) in fdist.most_common()\n",
    "]\n",
    "print('char_list= ', char_list)\n",
    "fdist.plot()"
   ]
  },
  {
   "source": [
    "### 3.2.4 访问子字符串（字符串切片）"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "monty[6:10]=  Pyth\nmonty[-12:-7]=  onty \nmonty[:5]=  Monty\nmonty[6:]=  Python!\n"
     ]
    }
   ],
   "source": [
    "print(\"monty[6:10]= \",monty[6:10])\n",
    "print(\"monty[-12:-7]= \",monty[-12:-7])\n",
    "print(\"monty[:5]= \",monty[:5])\n",
    "print(\"monty[6:]= \",monty[6:])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "found \"thing\"!\nmonty.find('Python')=  6\n"
     ]
    }
   ],
   "source": [
    "phrase = 'And now for something completely different'\n",
    "# in 操作符\n",
    "if 'thing' in phrase:\n",
    "    print('found \"thing\"!')\n",
    "# find() 函数\n",
    "print(\"monty.find('Python')= \", monty.find('Python'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "sent slice=  sentence\n"
     ]
    }
   ],
   "source": [
    "sent = \"my sentence...\"\n",
    "print('sent slice= ', sent[3:11])"
   ]
  },
  {
   "source": [
    "### 3.2.5 更多字符串操作函数：表3-2 P99"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "source": [
    "### 3.2.6 链表与字符串的差异"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "query[2]=  o\nquery[:2]=  Wh\nquery + \" I don't\"=  Who knows? I don't\n"
     ]
    }
   ],
   "source": [
    "query = 'Who knows?'\n",
    "print(\"query[2]= \", query[2])\n",
    "print(\"query[:2]= \", query[:2])\n",
    "print('query + \" I don\\'t\"= ', query + \" I don\\'t\")  # 字符串与字符串相加\n",
    "\n",
    "# query[0] = 'F'  # 字符串不可以修改初始值\n",
    "# del query[0]  # 字符串不可能删除里面的元素"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "beatles[2]=  George\nbeatles[:2]=  ['John', 'Paul']\n"
     ]
    }
   ],
   "source": [
    "beatles = ['John', 'Paul', 'George', 'Ringo']\n",
    "print(\"beatles[2]= \", beatles[2])\n",
    "print(\"beatles[:2]= \", beatles[:2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "beatles + ['Brian']=  ['John', 'Paul', 'George', 'Ringo', 'Brian']\n"
     ]
    }
   ],
   "source": [
    "# print(\"beatles + 'Brian'= \", beatles + 'Brian')  # 链表不能与字符串相加\n",
    "print(\"beatles + ['Brian']= \", beatles + ['Brian'])  # 链表与链表相加"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "beatles=  ['John', 'Paul', 'George', 'Ringo']\nbeatles=  ['John Lennon', 'Paul', 'George', 'Ringo']\nbeatles=  ['Paul', 'George', 'Ringo']\n"
     ]
    }
   ],
   "source": [
    "print(\"beatles= \", beatles)\n",
    "beatles[0] = 'John Lennon'  # 链表可以修改初始值\n",
    "print(\"beatles= \", beatles)\n",
    "del beatles[0]  # 链表可以删除里面的元素\n",
    "print(\"beatles= \", beatles)"
   ]
  }
 ]
}